Abstract



We consider a random tree and introduce a metric in the space of trees to define
the "mean tree" as the tree minimizing the average distance to the random tree.
When the resulting metric space is compact we have laws of large numbers and
central limit theorems for sequences of independent identically distributed random
trees. As an application we propose tests to check whether two samples of random
trees have the same law.



1 Introduction 

Random trees have long been an important modelling tool. In particular, trees are useful
when a collection of observed objects are all descended from a common ancestral object
via a process of duplication followed by gradual differentiation. This characterizes the
process of natural evolution, and also any form of information that over time is successively
replicated and transmitted with occasional error. There are two broad approaches to
constructing random evolutionary trees: forwards-in-time "branching process" models,
such as the Galton-Watson process, and backwards-in-time "coalescent" models, such as
Kingman's coalescent (Kingman, 1982).

We prove a law of large numbers and an invariance principle for random trees defined in
a metric space and propose a Kolmogorov-Smirnov-type goodness-of-fit test.

Our trees have a special vertex called the root and evolve forward in time in discrete
generations; each parent node (or vertex) has up to $m$ offspring nodes in the next generation.
The set of possible vertices is called $V$. A tree is a function $x : V \to \{0,1\}$, where $x(v)$
indicates whether the vertex $v \in V$ is present in $x$, with the restriction that a vertex cannot be
present if its mother is not. Call $\mathcal{T}$ the resulting space of trees; when $V$ is finite,
$\mathcal{T}$ is also finite. In the general case, $\mathcal{T}$ is a closed subset of the compact
product space $\{0,1\}^V$. Since the product topology is the one where convergence is in each
coordinate, the topology may be induced by different distances. In this setting $\mathcal{B}$, the
Borel $\sigma$-field, is the same as the one generated by the projections. A similar setup was
proposed by Otter (1949) and Neveu (1986), see Kurata and Minami (2004). For a probability
measure $\nu$ on $\mathcal{T}$, a random tree with law $\nu$ and a distance $d$ on $\mathcal{T}$,
the $d$-mean related to $\nu$ is defined as the tree (or set of trees) that minimizes the
$\nu$-average $d$-distance to the random tree. Other tree spaces and metrics are briefly
discussed in Section 7.

We consider a sample of independent and identically distributed random elements of a
compact metric space with law $\nu$ and a unique $d$-mean. We prove that the empiric $d$-mean
of the sample converges to the $d$-mean related to $\nu$ as the size of the sample goes to infinity.
Hence the empiric $d$-mean is a consistent estimator for the $d$-mean related to $\nu$. The result
applies to metric spaces of trees that may have infinitely many vertices. The law of large
numbers on metric spaces with negative curvature has been addressed by Herer (1992), de
Fitte (1997) and Es-Sahib and Heinich (1999). Our space is not of negative curvature, as
shown in Section 7. For compact metric spaces, a strong law of large numbers has been
obtained in Sverdrup-Thygeson (1981), which is used in our setting.

We show an invariance principle for the random processes $(g_n(y) - g(y),\ y \in \mathcal{T})$, where
$g_n(y)$ is the average of the distances from $y$ to the points of a sample of size $n$ and $g(y)$ is
the average of the distances from $y$ to the random tree with law $\nu$ from which the sample is
obtained. The proof is based on a theorem by Ledoux and Talagrand (1991); we build up a
probability measure on the space of trees that satisfies the "majorizing measure condition"
for a particular family of distances.

The invariance principle implies that the approximate distribution of

$$\max_{y \in \mathcal{T}} |g_n(y) - g(y)| \tag{1.1}$$

is known. We propose (1.1) as the statistic for a universal Kolmogorov-type goodness-of-fit
test, and its analogue for the two-sample problem. In general $(g_n(y) - g(y),\ y \in \mathcal{T})$ does
not identify the measure $\nu$. Busch et al (2006) show that $(g_n(y) - g(y),\ y \in \mathcal{T})$ identifies
the vertex-marginals $(\nu\{x : x(v) = 1\},\ v \in V)$ and vice versa. The vertex-marginals do
not always identify the measure, but they do if the tree is constructed in a Markovian way;
examples include Galton-Watson and other related processes.

As far as we know the Otter-Neveu set-up has not been used before to construct
statistical tools for random trees. With this structure the law of large numbers and
invariance principles are quite straightforward and the statistic (1.1) arises naturally to perform
goodness-of-fit tests. The computation of the statistic (1.1) requires in principle a number of
steps that is exponential in the number of possible vertices. Busch et al (2006) show that
the search for the maximum in (1.1) is equivalent to the search for the minimal cut in an
associated network, a technique coming from image reconstruction. This makes the test
viable for reasonably big trees.

The critical values related to the statistic (1.1) depend on the distribution $\nu$. To
compute them it is usually necessary to simulate trees with the tested distribution or
to perform a bootstrap. Our test has been applied to samples of Galton-Watson related
processes obtained by simulation and to a classification of FGF protein families (Busch
et al 2006). In both cases the test has been successful in distinguishing different laws, even
when the mean tree is the same for the two samples.

In Section 2 we introduce the space of trees as a metric space and define the $d$-mean
tree. In Section 3 we prove the law of large numbers. In Section 4 we give some examples
and in Section 5 we prove the invariance principle. In Section 6 we describe the statistical
applications. In Section 7 we show that our space is not of negative curvature and discuss
some other possible metrics.



2 A metric space of rooted trees 

Let $V = \{1, 11, 12, \dots, 1m, \dots\}$ be the set of finite sequences of numbers in $A = \{1, \dots, m\}$
starting with 1, with $m$ a natural number. Elements of $V$ are called vertices; the vertex 1
is called the root. The full tree is the oriented graph $x = (V, E)$ with edges $E \subset V \times V$ given
by $E = \{(v, va) : v \in V,\ a \in A\}$, where $va$ is the sequence obtained by juxtaposition of $v$
and $a$. In the full tree each node or vertex has exactly $m$ outgoing edges to her offspring
and one ingoing edge from her mother, except for the root, which has no ingoing edges. The
node $v = a_1 \dots a_k$ is said to belong to generation $k$; in this case we write $\mathrm{gen}(v) = k$.
Generation 1 has only one node: the root of the tree.

We define a tree as a function $x : V \to \{0,1\}$ satisfying, for all $v \in V$ and $a \in A$,

$$x(v) \ge x(va). \tag{2.1}$$



Abusing notation, we identify $x$ with the graph $x = (V_x, E_x)$ where

$$V_x = \{v \in V : x(v) = 1\}, \tag{2.2}$$

$$E_x = \{(v, va) \in E : x(v) = x(va) = 1\}. \tag{2.3}$$

Let $\mathcal{T}$ be the set of trees of this form. Condition (2.1) in effect requires that for $x \in \mathcal{T}$,
every node in $x$ must have an ancestor in each previous generation back to the root.
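As an illustration of condition (2.1), the following minimal Python sketch stores a tree as the set of labels of its present vertices and checks that every non-root vertex has its mother present. The function names and the string encoding of vertices (which assumes $m \le 9$, so that each offspring index is a single digit) are choices made here for illustration only.

```python
def tree_from_terminals(terminals):
    """Vertex set of the finite tree with the given terminal nodes.

    Every non-empty prefix of a terminal label is a vertex of the tree.
    """
    return frozenset(leaf[:k] for leaf in terminals for k in range(1, len(leaf) + 1))


def is_tree(vertices, m):
    """Check condition (2.1) for a set of vertex labels.

    Labels must start at the root '1', use offspring indices in 1..m,
    and every non-root vertex must have its mother in the set.
    """
    letters = {str(a) for a in range(1, m + 1)}
    for v in vertices:
        if not v or v[0] != "1" or not set(v[1:]) <= letters:
            return False
        if len(v) > 1 and v[:-1] not in vertices:
            return False
    return True


def terminal_nodes(vertices):
    """Vertices of the tree that have no daughter present."""
    return {v for v in vertices if not any(w[:-1] == v for w in vertices)}


a = tree_from_terminals(["111", "12"])
print(sorted(a), is_tree(a, 2), sorted(terminal_nodes(a)))
```

With this encoding the empty set is the empty tree, and a set such as `{"1", "121"}` is rejected because the mother `12` is missing.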

A finite tree is characterized by the set of its terminal nodes. For example, the trees in
Figure 1 are (a) $\{111, 12\}$, and (b) $\{11, 121\}$.

Figure 1: Two finite trees, (a) and (b), both with 3 generations and 2 terminal nodes.

The product topology on $\{0,1\}^V$ is the smallest topology for which the projections are continuous.
By projections we mean the family of functions $\pi_v : \{0,1\}^V \to \{0,1\}$ that map $x \mapsto x(v)$.
In this topology $x_n$ converges to $x$ if and only if $x_n(v)$ converges to $x(v)$ for all $v \in V$.
Since at each vertex we have values in $\{0,1\}$, convergence means that for each $v$ there
exists $n(v)$ such that if $n > n(v)$ then $x_n(v) = x(v)$. This condition guarantees that $\mathcal{T}$,
the space of trees, is a closed set in $\{0,1\}^V$ and hence also compact.

As is done in interacting particle systems (see Liggett (1985)) we consider the sigma
algebra $\mathcal{B}$ generated by the cylinders $\{x \in \mathcal{T} : x(v) = 1\}$, $v \in V$; this is just the Borel
sigma field generated by the product topology.

We provide $\mathcal{T}$ with a distance $d$, so that $(\mathcal{T}, d)$ is a metric space. We use the family of
distances in $\mathcal{T}$ defined by

$$d(x,y) = \sum_{v \in V} |x(v) - y(v)|\,\phi(v), \tag{2.4}$$

for some strictly positive function $\phi : V \to \mathbb{R}^+$ satisfying $\sum_{v \in V} \phi(v) < \infty$. In this case,
the distance between the two trees of Figure 1 is $d(a,b) = \phi(111) + \phi(121)$.

This distance is compatible with the product topology and hence the notion of
convergence under any of these metrics is the same as the one induced by the product topology.
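For finite trees the distance (2.4) is simply a weighted count of the vertices present in exactly one of the two trees. The sketch below illustrates this in Python; the encoding of trees as sets of vertex labels, the function names, and the particular weight $\phi(v) = z^{\mathrm{gen}(v)}$ with $z = 0.3$ are assumptions made for this example (a weight of this form is used later in Lemma 5.2).

```python
def tree_from_terminals(terminals):
    """Vertex set of the finite tree with the given terminal nodes."""
    return frozenset(leaf[:k] for leaf in terminals for k in range(1, len(leaf) + 1))


def phi(v, z=0.3):
    """Illustrative weight z**gen(v); gen(v) is the length of the label v."""
    return z ** len(v)


def distance(x, y, weight=phi):
    """Distance (2.4): total weight of the vertices present in exactly one tree."""
    return sum(weight(v) for v in x ^ y)


a = tree_from_terminals(["111", "12"])   # tree (a) of Figure 1
b = tree_from_terminals(["11", "121"])   # tree (b) of Figure 1
print(sorted(a ^ b))                     # the vertices on which a and b differ
print(distance(a, b), phi("111") + phi("121"))
```

The symmetric difference of the two vertex sets contains only `111` and `121`, in agreement with $d(a,b) = \phi(111) + \phi(121)$.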



Otter (1949) and Neveu (1986) propose a similar construction, but to deal with an
unbounded number of offspring they ask each vertex $v$ and natural number $a$ to satisfy
$x(v(a+1)) \le x(va)$; informally, the presence of a brother in the tree implies that all older
brothers are also present. Their distance, also compatible with the product topology, is defined by

$$d_{ON}(x,t) = \exp\bigl(-\max\{k : x(v) = t(v) \text{ for all } v \text{ such that } \mathrm{gen}(v) \le k\}\bigr). \tag{2.5}$$

See Kurata and Minami (2004) for a review of those papers.

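Random trees A random tree with distribution $\nu$ is a measurable function

$$T : \Omega \to \mathcal{T} \quad \text{such that} \quad P(T \in A) = \int_A \nu(dx) \tag{2.6}$$

for any Borel set $A \in \mathcal{B}$, where $(\Omega, \mathcal{F}, P)$ is a probability space and $\nu$ a probability on
$(\mathcal{T}, \mathcal{B})$.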

The expected distance from a tree $y$ to a random tree $T$ is defined by

$$g(y) := \mathbf{E}(d(T,y)) = \int_{\mathcal{T}} d(x,y)\,\nu(dx) \tag{2.7}$$

$$\phantom{g(y) :}= \sum_{x \in \mathcal{T}} \nu(x)\,d(x,y) \quad \text{(in the discrete case)}. \tag{2.8}$$

Definition 2.1 The expected value or $d$-mean of a random tree $T$ is the set (of trees) $\mathbf{E}_d T$
that minimizes the expected distance to $T$:

$$\mathbf{E}_d T := \arg\min_{y \in \mathcal{T}} g(y). \tag{2.9}$$

The set $\mathbf{E}_d T$ might be empty, but if $\mathcal{T}$ is compact, then $\mathbf{E}_d T$ is not empty (see Section
3). Any element of the set $\mathbf{E}_d T$ is also called a $d$-mean. Since $\mathbf{E}_d T$ depends only on the
distribution $\nu$ induced by $T$ on $\mathcal{T}$, it may also be denoted by $\mathbf{E}_d(\nu)$. The elements of
$\mathbf{E}_d(\nu)$ are also called $d$-centers. The notion of expected value depends on the distance $d$;
in particular, for random variables in $\mathbb{R}^k$ we may obtain the usual mean, the median and
the mode, as illustrated in Section 4.

Example: In the Galton-Watson branching process the numbers of offspring of distinct
nodes are i.i.d. In the special case that they have the Binomial$(2,p)$ distribution with
$p \in [0,1]$, the offspring number is 0, 1, or 2 with probabilities $(1-p)^2$, $2p(1-p)$, and $p^2$.
Letting $k_0 = \max\{k \in \{0, 1, \dots\} : p^k \ge 1/2\}$ there are two cases: (a) if $p^{k_0} > 1/2$ there is
only one mean tree $x$, satisfying $x(v) = 1$ if and only if $\mathrm{gen}(v) \le k_0$, and (b) if $p^{k_0} = 1/2$
the mean tree is the set of trees with $x(v) = 1$ if $\mathrm{gen}(v) < k_0$, $x(v) \in \{0,1\}$ if $\mathrm{gen}(v) = k_0$
and $x(v) = 0$ if $\mathrm{gen}(v) > k_0$. In particular, if $p < 1/2$ the mean tree is the empty tree.
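The depth $k_0$ in this example is easy to compute numerically. The sketch below is only an illustration: the function name and the tolerance used to detect the boundary case $p^{k_0} = 1/2$ in floating point are assumptions made here, and $p = 1$ is excluded because then every generation qualifies.

```python
import math


def mean_tree_depth(p, tol=1e-12):
    """Return (k0, unique) for Binomial(2, p) offspring with 0 <= p < 1.

    k0 is the largest k >= 0 with p**k >= 1/2; `unique` is False in the
    boundary case p**k0 == 1/2 (up to `tol`), where generation k0 is optional.
    """
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    k0 = 0
    while p ** (k0 + 1) >= 0.5 - tol:
        k0 += 1
    unique = k0 == 0 or not math.isclose(p ** k0, 0.5, abs_tol=tol)
    return k0, unique


for p in (0.4, 0.5, 0.8, 0.9):
    print(p, mean_tree_depth(p))
```

A returned depth of 0 corresponds to the empty mean tree, which is what happens for every $p < 1/2$.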



Let $T_1, \dots, T_n$ be a random sample of $T$ (independent random trees with the same law
as $T$). The empiric measure associated to the sample is denoted by $\mu_n$ and it is given by

$$\mu_n(A) := \frac{1}{n} \sum_{i=1}^{n} \delta_{T_i}(A) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_A(T_i), \tag{2.10}$$

where $\delta_x$ is the point mass at $x$ and $\mathbf{1}_A$ is the indicator function of the set $A$. Associated
to this measure we define the empiric expected distance of a tree $y$ to the sample by

$$g_n(y) := \int_{\mathcal{T}} d(x,y)\,\mu_n(dx) = \frac{1}{n} \sum_{i=1}^{n} d(T_i, y), \tag{2.11}$$

and, as in (2.9), the empiric mean tree (empiric $d$-center, sample $d$-mean) as the random set
given by

$$\overline{T}_n := \arg\min_{y \in \mathcal{T}} g_n(y). \tag{2.12}$$

The empirical mean is like a consensus tree. If $n$ is odd, the empirical mean is unique;
it just includes all vertices that are in more than half of the trees. If $n$ is even, it is not
necessarily unique, but there is a "shortest" and a "largest" empirical mean tree, and every
subtree of the largest empirical mean tree which contains the shortest empirical mean tree is in
the set of empirical $d$-means. This is a nice property from the robustness point of view.
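The majority-vote description above translates directly into code. The following sketch, under the assumption that trees are stored as sets of vertex labels and that $\phi(v) = z^{\mathrm{gen}(v)}$ with an arbitrary $z = 0.3$, computes the shortest and largest empirical mean trees and the empiric expected distance (2.11); all names are illustrative.

```python
from collections import Counter


def tree_from_terminals(terminals):
    """Vertex set of the finite tree with the given terminal nodes."""
    return frozenset(leaf[:k] for leaf in terminals for k in range(1, len(leaf) + 1))


def empirical_means(sample):
    """Return the (shortest, largest) empirical mean trees of a list of trees.

    The shortest keeps the vertices present in more than half of the sample,
    the largest also keeps those present in exactly half of it.
    """
    n = len(sample)
    counts = Counter(v for tree in sample for v in tree)
    shortest = frozenset(v for v, c in counts.items() if 2 * c > n)
    largest = frozenset(v for v, c in counts.items() if 2 * c >= n)
    return shortest, largest


def g_n(y, sample, z=0.3):
    """Empiric expected distance (2.11) with the illustrative weight z**gen(v)."""
    return sum(sum(z ** len(v) for v in t ^ y) for t in sample) / len(sample)


a = tree_from_terminals(["111", "12"])
b = tree_from_terminals(["11", "121"])
x = tree_from_terminals(["11", "12"])
print(empirical_means([a, b, x]))   # odd sample size: both trees coincide
print(empirical_means([a, b]))      # even sample size: they may differ
```

Both returned sets are trees, because a vertex can never occur in more sample trees than its mother does.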



3 Law of large numbers 

If $\nu$ is defined on a finite set of trees the following law of large numbers follows immediately.

Theorem 3.1 Let $(\mathcal{T}, d)$ be a finite tree space with metric $d$. Let $T \in \mathcal{T}$ be a random tree
with law $\nu$ such that $\mathbf{E}_d T$ has only one element (also denoted by $\mathbf{E}_d T$). Let $\{T_n,\ n \ge 1\}$
be an i.i.d. sequence of random trees with law $\nu$. If $y_n$ is any of the empiric mean trees of
$\{T_1, \dots, T_n\}$, that is $y_n \in \overline{T}_n$, then

$$\lim_{n \to \infty} d(y_n, \mathbf{E}_d T) = 0 \quad \text{a.s.} \tag{3.1}$$

In other words, the set of empiric mean trees coincides with the singleton of the $d$-mean if
$n$ is large enough.

When $\nu$ is an arbitrary probability measure on $\mathcal{T}$, it may give positive mass to sets of
trees with infinitely many nodes. First we state the strong law of large numbers for random
elements taking values in a compact metric space given in Sverdrup-Thygeson (1981). This
covers the space of trees with an infinite number of vertices. Then we show that the metric
space $\mathcal{T}$ is compact; this implies in particular that the expected tree is well defined ($\mathbf{E}_d(T)$
is non-empty).

Consider a compact metric space $(\mathcal{K}, d)$. Let $\mathcal{B}$ denote the $\sigma$-field generated by the open
sets, so that the elements of $\mathcal{B}$ are the Borel sets. Let $\nu$ be a probability measure on $\mathcal{B}$. We
define the expected value with respect to the measure $\nu$ and the distance $d$ following the
ideas developed in the previous section. Let $g : \mathcal{K} \to \mathbb{R}^+$ be given by

$$g(y) := \int_{\mathcal{K}} d(y,x)\,\nu(dx). \tag{3.2}$$

Since

$$|g(y) - g(t)| \le \int_{\mathcal{K}} |d(y,x) - d(t,x)|\,\nu(dx) \le \int_{\mathcal{K}} d(y,t)\,\nu(dx) = d(y,t), \tag{3.3}$$

we get that $g$ is Lipschitz continuous. Since it is defined on a compact space, it attains its
minimum. This shows that the $d$-mean set $\mathbf{E}_d(\nu)$ defined as in (2.9) is non-empty. The
empiric mean $\overline{T}_n$ is defined as in (2.12).

Theorem 3.2 (Sverdrup-Thygeson, 1981) Let $\nu$ be a probability on the compact
metric space $(\mathcal{K}, d)$ such that $\mathbf{E}_d(\nu)$ has only one point. Consider $\{T_n : n \ge 1\}$, an i.i.d.
sample for $\nu$. Then, the empirical $d$-centers converge uniformly to $\mathbf{E}_d(\nu)$ almost surely:

$$\lim_{n \to \infty} \sup_{a \in \overline{T}_n} d(a, \mathbf{E}_d(\nu)) = 0 \quad \text{a.s.} \tag{3.4}$$

The results of this section can be extended to the following family of functions $g_p$ defined
for $p \ge 1$ by

$$g_p(y) = \int_{\mathcal{K}} d(y,x)^p\,\nu(dx).$$



4 Examples 

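Mode parameter Consider a finite space $\mathcal{K}$, with the discrete distance given by

$$d(x,y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{otherwise.} \end{cases} \tag{4.1}$$

In this case,

$$g(x) = \int d(x,y)\,\nu(dy) = \sum_{y \ne x} \nu(y) = 1 - \nu(x). \tag{4.2}$$

So, the $d$-center parameter for $(\mathcal{K}, d, \nu)$ is just the mode of $\nu$.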



Mean and median parameters Consider $\mathcal{K} = [0,1]^n \subset \mathbb{R}^n$, and $d(x,y) = \|x - y\|^p$. Let
$\nu$ be any probability measure on $\mathcal{K}$. Then, if $p = 2$ we have that the $d$-center parameter
is the usual expected value. For $n = 1$ and $p = 1$ we get the median, and for $n > 1$ the
spatial median or multivariate $L_1$-median; see for instance Haldane (1948) and Milasevic
and Ducharme (1987).
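A quick numerical check of these statements in dimension $n = 1$ can be done on a grid. The sketch below reads $d$ as the $p$-th power of the absolute difference (an interpretation adopted here) and minimizes the sample average of $d(x,y)$ over grid points $y \in [0,1]$; the function name, the grid resolution and the sample are hypothetical.

```python
import statistics


def d_center_on_grid(sample, p, grid_size=1001):
    """Grid point y in [0, 1] minimizing the sample average of |x - y|**p."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda y: sum(abs(x - y) ** p for x in sample))


sample = [0.1, 0.2, 0.25, 0.4, 0.9]
print(d_center_on_grid(sample, 2), statistics.mean(sample))    # p = 2: the mean
print(d_center_on_grid(sample, 1), statistics.median(sample))  # p = 1: the median
```

For this sample the two minimizers differ noticeably because the observation 0.9 pulls the mean, but not the median, to the right.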

Product Space We say that $(\mathcal{K}, d, \nu)$ is a centered space if it has a unique $d$-center. We
now prove that the product of a finite number of centered spaces is a centered space.

Lemma 4.1 Let $(\mathcal{K}_i, d_i, \nu_i)$ be spaces with unique $d$-centers $C_i = \mathbf{E}_{d_i}(\nu_i)$, for $i = 1, 2$.
Then, if we consider the product space $\mathcal{K} = \mathcal{K}_1 \times \mathcal{K}_2$ with

$$d(x,y) = d_1(x_1, y_1) + d_2(x_2, y_2), \tag{4.3}$$

for $x = (x_1, x_2) \in \mathcal{K}$ and the product measure $\nu = \nu_1 \times \nu_2$, we get that $(\mathcal{K}, d, \nu)$ also has a
unique $d$-center $(C_1, C_2)$.

Proof We need to prove that $(C_1, C_2)$ is the unique point minimizing $g : \mathcal{K} \to \mathbb{R}$. We get

$$g(x) = \int\!\!\int \bigl(d_1(x_1, y_1) + d_2(x_2, y_2)\bigr)\,\nu_2(dy_2)\,\nu_1(dy_1) = g_1(x_1) + g_2(x_2), \tag{4.4}$$

where

$$g_i(x) = \int d_i(x,y)\,\nu_i(dy), \tag{4.5}$$

from where the result follows. □



5 Invariance Principle 

In this section we consider a sequence of independent identically distributed random trees
$(T_1, \dots, T_n)$ with empiric expected distance $g_n(t)$ given by (2.11) and prove an invariance
principle for the centered process

$$\bigl(\sqrt{n}\,(g_n(t) - g(t)),\ t \in \mathcal{T}\bigr),$$

as $n \to \infty$. The main tool is the following general result.

Theorem 5.1 (Ledoux and Talagrand (1991), pp. 395-396) Let $T$ be a compact
metric space and $C(T)$ be the separable Banach space of continuous functions on $T$ with the
sup norm. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \to C(T)$ be a random element of
$C(T)$ with $\mathbf{E}X(t) = 0$ and $\mathbf{E}X(t)^2 < \infty$ for all $t$ in $T$. Assume that $X$ is Lipschitz, that
is, there exists a positive random variable $M$ with $\mathbf{E}M^2 < \infty$ such that

$$|X(\omega, s) - X(\omega, t)| \le M(\omega)\,d(s,t), \tag{5.1}$$

for all $\omega \in \Omega$, $s, t \in T$. Assume there exists a probability measure $\mu$ on $(T, d)$ such that

$$\lim_{\delta \to 0}\ \sup_{t \in T} \int_0^{\delta} \bigl[-\log \mu(B(t,u))\bigr]^{1/2}\,du = 0, \tag{5.2}$$

where $B(t,u)$ is the ball centered at $t$ with radius $u$. (This is called the majorizing measure
condition for $(T, d)$.) Then $X$ verifies the Central Limit Theorem in $C(T)$. That is, if
$X_1, \dots, X_n$ are i.i.d. with the same law as $X$, then $n^{-1/2}(X_1 + \dots + X_n)$ converges to a
Gaussian process with mean zero and the same covariance function as $X$.

The majorizing measure condition If $T$ is finite, the condition is satisfied
automatically by any measure $\mu$ on $T$ giving positive mass to all elements of $T$. Indeed,
$\mu(B(t,u)) \ge \mu(t) > 0$ and the integral in (5.2) is dominated by $[-\log \mu(t)]^{1/2}\,\delta$.

Lemma 5.2 Let $\mathcal{T}$ be the set of trees. Let $0 < z < m^{-3/2}$ and $\phi$ be defined by

$$\phi(v) = z^{\mathrm{gen}(v)}. \tag{5.3}$$

Then the majorizing measure condition is satisfied for $(\mathcal{T}, d)$ with the distance defined by
(2.4) and this $\phi$.



Proof Since for finite trees the result follows, we assume the trees in $\mathcal{T}$ have infinitely
many generations. Define the cylinder of generation $k$ induced by the tree $t \in \mathcal{T}$ by

$$\mathcal{T}_k(t) := \{s \in \mathcal{T} : s(v) = t(v) \text{ if } \mathrm{gen}(v) \le k\}. \tag{5.4}$$

Define for $u > 0$

$$k(u) = k(u, \phi) := \inf\Bigl\{k : \sum_{v} \phi(v)\,\mathbf{1}\{\mathrm{gen}(v) > k\} < u\Bigr\}.$$

Since $\sum_v \phi(v) < \infty$, $k(u)$ goes to $\infty$ as $u$ goes to 0. Since

$$\sum_{v} \phi(v)\,\mathbf{1}\{\mathrm{gen}(v) > k\} = \sum_{i > k} m^{i-1} z^{i} = \frac{z\,(mz)^k}{1 - mz}, \tag{5.5}$$

we can write

$$k(u) = \inf\bigl\{k : z\,(mz)^k/(1 - mz) < u\bigr\}. \tag{5.6}$$

We have

$$\mathcal{T}_{k(u)}(t) \subset B(t,u). \tag{5.7}$$



A natural choice for a majorizing measure in $\mathcal{T}$ is the measure induced by the product
measure $\nu_p$ on $\{0,1\}^V$ with marginals $\nu_p\{\xi : \xi(v) = 1\} = p$, for $v \in V$. Given a
configuration $\xi \in \{0,1\}^V$, define $x(\xi)$ as the maximal tree from the root whose vertices are
contained in the set $\xi$. In other words, inductively, $x(\xi)(1) = \xi(1)$ and

$$x(\xi)(va) := \begin{cases} 1 & \text{if } x(\xi)(v) = 1 \text{ and } \xi(va) = 1, \\ 0 & \text{otherwise,} \end{cases} \tag{5.8}$$

for each $v \in V$ and $a \in A$. Define the measure $\mu_p$ induced on $\mathcal{T}$ by this application:

$$\mu_p(B) := \nu_p\{\xi : x(\xi) \in B\}.$$
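The map $\xi \mapsto x(\xi)$ of (5.8) is a pruning step: keep a vertex of $\xi$ only if all its ancestors are also in $\xi$. A minimal Python sketch follows, truncated to a finite number of generations so that it can be run (the truncation, the function names and the string labels, which assume $m \le 9$, are assumptions of this example).

```python
import random


def all_vertices(m, depth):
    """Labels of the full m-ary tree up to generation `depth` (root is '1')."""
    level, out = ["1"], ["1"]
    for _ in range(depth - 1):
        level = [v + str(a) for v in level for a in range(1, m + 1)]
        out.extend(level)
    return out


def maximal_tree(xi):
    """x(xi) of (5.8): vertices of xi joined to the root through vertices of xi."""
    tree = set()
    for v in sorted(xi, key=len):            # mothers are visited before daughters
        if v == "1" or v[:-1] in tree:
            tree.add(v)
    return frozenset(tree)


def sample_mu_p(p, m, depth, rng):
    """Draw xi from the product measure with marginal p and return x(xi)."""
    xi = {v for v in all_vertices(m, depth) if rng.random() < p}
    return maximal_tree(xi)


print(sorted(maximal_tree({"1", "11", "121"})))   # '121' is dropped: '12' is absent
print(sorted(sample_mu_p(0.6, 2, 3, random.Random(0))))
```

If the root is not in $\xi$ the result is the empty tree, in agreement with $x(\xi)(1) = \xi(1)$.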

To check that $\mu_p$ is a majorizing measure, let $\beta > 0$ be defined by $e^{-\beta} = \min\{p, 1-p\}$.
The number of vertices in the first $k$ generations of the full tree is $(m^k - 1)/(m - 1) < 2m^k$.
Hence the probability of any cylinder with $k$ generations is bigger than $e^{-2\beta m^k}$:

$$\mu_p(\mathcal{T}_k(t)) \ge \nu_p\bigl\{\xi \in \{0,1\}^V : \xi(v) = t(v) \text{ if } \mathrm{gen}(v) \le k\bigr\} \ge e^{-2\beta m^k}, \tag{5.9}$$

uniformly in $t$. This and (5.7) imply that the supremum of the integral in (5.2) is bounded
above by

$$\int_0^{\delta} (2\beta)^{1/2}\, m^{k(u)/2}\,du = \int_0^{\delta} (2\beta)^{1/2}\, e^{k(u)\log(m^{1/2})}\,du \le (2\beta)^{1/2} \int_0^{\delta} u^{-(1-\varepsilon)}\,du, \tag{5.10}$$

for $\delta$ small enough, if there exists an $\varepsilon > 0$ such that $k(u) \le -(\log u)(1 - \varepsilon)/\log(m^{1/2})$
for $u$ small enough. In this case the proof is finished because for $\varepsilon > 0$ (5.10) converges to
zero as $\delta \to 0$. Call $\gamma = (1 - \varepsilon)/\log(m^{1/2})$. In view of (5.5), we look for $\gamma > 0$ such that
$(mz)^{-\gamma \log u} < u(1 - mz)/z$. That is,

$$u^{-\gamma \log(mz) - 1} < \frac{1 - mz}{z}.$$

For $u$ sufficiently small it suffices that $-\gamma \log(mz) - 1 > 0$ and $z < m^{-1}$. Substituting $\gamma$
and noticing that $\log(mz) < 0$, we need to find an $\varepsilon > 0$ such that

$$-(1 - \varepsilon) < \frac{\log(m^{1/2})}{\log(mz)}, \quad \text{that is,} \quad \varepsilon < 1 + \frac{\log(m^{1/2})}{\log(mz)},$$

which exists since $z < m^{-3/2}$. □
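The role of the threshold $z < m^{-3/2}$ can be checked numerically. The sketch below computes $k(u)$ from (5.6) and the upper bound $1 + \log(m^{1/2})/\log(mz)$ on $\varepsilon$; the function names and the values $m = 2$, $z = 0.3$, $\varepsilon = 0.1$ are illustrative choices, not part of the proof.

```python
import math


def k_of_u(u, m, z):
    """k(u) of (5.6): smallest k >= 0 with z*(m*z)**k / (1 - m*z) < u (needs m*z < 1)."""
    k = 0
    while z * (m * z) ** k / (1 - m * z) >= u:
        k += 1
    return k


def epsilon_upper_bound(m, z):
    """The bound 1 + log(m**0.5)/log(m*z); some epsilon > 0 exists iff it is positive."""
    return 1 + math.log(math.sqrt(m)) / math.log(m * z)


m, z, eps = 2, 0.3, 0.1                      # z < 2**-1.5, about 0.354
gamma = (1 - eps) / math.log(math.sqrt(m))
print(epsilon_upper_bound(m, z))             # positive, so eps = 0.1 is admissible
for j in range(1, 7):
    u = 10.0 ** (-j)
    print(u, k_of_u(u, m, z), -gamma * math.log(u))   # k(u) stays below the bound
```

For a value such as $z = 0.4 > 2^{-3/2}$ the bound on $\varepsilon$ becomes negative, so the argument of the proof no longer applies.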

We are now able to obtain the asymptotic distribution of the process.

Theorem 5.3 Let $\mathcal{T}$ be the set of trees with at most $m$ offspring. Consider the distance
given in (2.4) for $\phi(v) = z^{\mathrm{gen}(v)}$ with $0 < z < m^{-3/2}$. Let $\{T_i : i \ge 1\}$ be a sequence of
i.i.d. random trees on $\mathcal{T}$ with the same law as $T$. Then the process $(\sqrt{n}\,(g_n(t) - g(t)),\ t \in \mathcal{T})$
converges weakly as $n \to \infty$ to a Gaussian process $W$ with zero mean and the same
covariance function as the process $X \in (\mathbb{R}^+)^{\mathcal{T}}$ defined by $X(t) = d(T,t) - \mathbf{E}(d(T,t))$.

Proof Since $|X(\omega, t) - X(\omega, t')| \le 2d(t, t')$ the result follows from the previous Lemma
and Theorem 5.1. □




6 Statistical applications 

Let $T$ be a random tree in $\mathcal{T}$ with distribution $\nu$ and mean distances $(g(y),\ y \in \mathcal{T})$ defined
in (2.7). Let $\nu_0$ be a distribution on the tree space $\mathcal{T}$ with mean distances $(g_0(y),\ y \in \mathcal{T})$.
The goal is to test

H0: $\nu = \nu_0$
HA: $\nu \ne \nu_0$

using an i.i.d. sample of random trees $\{T_i : i \ge 1\}$. Notice however that the rejection of
H0 does not imply the rejection of $\mathbf{E}_d T = \mathbf{E}_d(\nu_0)$.

To perform the test we propose the statistic

$$\sup_{y \in \mathcal{T}} |W_n(y)| = \sup_{y \in \mathcal{T}} \sqrt{n}\,|g_n(y) - g_0(y)|, \tag{6.1}$$

whose asymptotic law under H0 is obtained from Theorem 5.3 and the Continuous
Mapping Theorem. We reject the null hypothesis at level $\alpha$ if

$$\sup_{y \in \mathcal{T}} |W_n(y)| > q_\alpha,$$

where $q_\alpha$ satisfies $P(\sup_{y \in \mathcal{T}} |W(y)| > q_\alpha) = \alpha$, for $W$ given in Theorem 5.3 under $\nu = \nu_0$.

The test rejects $\nu = \nu_0$ if $g$ determines $\nu$ unequivocally.

In practice the distribution of $\sup_{y \in \mathcal{T}} |W(y)|$ depends on the covariance of the process
$X(t) = d(T,t) - \mathbf{E}(d(T,t))$, which in general is unknown. A possible way to deal with this
problem is to approximate $q_\alpha$ using the bootstrap. The validity of the bootstrap in this context
remains an open problem. Alternatively, one can simulate trees with distribution $\nu_0$ and
estimate $q_\alpha$.
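The simulation route to $q_\alpha$ can be sketched on a toy tree space. The Python code below is an illustration only: it uses binary trees with at most 3 generations, the weight $\phi(v) = z^{\mathrm{gen}(v)}$ with $z = 0.3$, and a hypothetical null model in which each vertex is present with probability $p$ given that its mother is present (so that a vertex of generation $k$ is present with probability $p^k$). The supremum in (6.1) is found by brute-force enumeration of all trees, which is feasible only for such small spaces; this is not the minimal-cut method of Busch et al (2006).

```python
import random
from itertools import combinations

M, DEPTH, Z = 2, 3, 0.3          # illustrative values, with Z < M ** -1.5


def all_vertices():
    """Labels of the full tree, listed generation by generation."""
    level, out = ["1"], ["1"]
    for _ in range(DEPTH - 1):
        level = [v + str(a) for v in level for a in range(1, M + 1)]
        out += level
    return out


VERTICES = all_vertices()
TREES = [frozenset(c) for r in range(len(VERTICES) + 1)
         for c in combinations(VERTICES, r)
         if all(v == "1" or v[:-1] in c for v in c)]      # condition (2.1)


def distance(x, y):
    """Distance (2.4) with phi(v) = Z ** gen(v)."""
    return sum(Z ** len(v) for v in x ^ y)


def sample_tree(p, rng):
    """Hypothetical null model: a vertex is kept w.p. p if its mother is present."""
    tree = set()
    for v in VERTICES:
        if (v == "1" or v[:-1] in tree) and rng.random() < p:
            tree.add(v)
    return frozenset(tree)


def g0(y, p):
    """Exact expected distance under the null model, using P(v present) = p ** gen(v)."""
    return sum(Z ** len(v) * ((1 - p ** len(v)) if v in y else p ** len(v))
               for v in VERTICES)


def statistic(sample, p):
    """Statistic (6.1): sqrt(n) times the largest |g_n(y) - g_0(y)| over all trees y."""
    n = len(sample)
    return n ** 0.5 * max(abs(sum(distance(t, y) for t in sample) / n - g0(y, p))
                          for y in TREES)


def critical_value(p, n, alpha, reps, rng):
    """Monte Carlo estimate of q_alpha from `reps` samples simulated under the null."""
    sims = sorted(statistic([sample_tree(p, rng) for _ in range(n)], p)
                  for _ in range(reps))
    return sims[min(reps - 1, int((1 - alpha) * reps))]


rng = random.Random(0)
q = critical_value(0.7, 20, 0.05, 200, rng)
observed = [sample_tree(0.5, rng) for _ in range(20)]     # data from another law
print(len(TREES), q, statistic(observed, 0.7))
```

The test would reject H0 when the printed statistic exceeds the estimated $q_\alpha$; the number of replications and the sample size are arbitrary here.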

For the problem of two samples (of the same size, for instance) one may use the statistic

$$\sup_{y \in \mathcal{T}} \sqrt{n}\,|g_n(y) - g'_n(y)|, \tag{6.2}$$

where $g_n$ and $g'_n$ correspond to the samples of $T$ and $T'$ respectively.

When does $g$ characterize the measure $\nu$? Busch et al (2006) prove that $g = (g(t),\ t \in \mathcal{T})$
characterizes the vertex-marginal distributions as follows. Let $\nu$ and $\nu'$ be two measures
in $\mathcal{T}$ and $g$, $g'$ be the corresponding processes. Then $g = g'$ if and only if $\nu\{t : t(v) = 1\}
= \nu'\{t : t(v) = 1\}$ for every vertex $v$. In that paper it is proven that under certain Markov
hypotheses, the vertex-marginals identify the measure uniquely. The class of random
trees satisfying those hypotheses includes Galton-Watson branching processes and other
related processes.
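One direction of this equivalence can be read off the distance (2.4) directly. The short computation below is a sketch written for this exposition, not quoted from Busch et al (2006); the symbol $p_v$ for the vertex-marginal is introduced here, and the computation uses $|a - b| = a + b - 2ab$ for $a, b \in \{0,1\}$ together with $\sum_v \phi(v) < \infty$ to exchange sum and expectation.

```latex
g(y) \;=\; \sum_{v \in V} \phi(v)\, \mathbf{E}\,|T(v) - y(v)|
     \;=\; \sum_{v \in V} \phi(v)\, \bigl[\, p_v + y(v) - 2\, p_v\, y(v) \,\bigr],
\qquad p_v := \nu\{t : t(v) = 1\}.
```

Hence $g$ depends on $\nu$ only through the vertex-marginals $(p_v,\ v \in V)$, so two measures with the same vertex-marginals give the same $g$.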






7 Metrics and negative curvature 

In this section we show that our tree space cannot be embedded in a metric space of
non-positive curvature. Then we discuss other possible metrics that have been considered for
spaces of trees. A natural way of embedding the discrete tree space $\mathcal{T}$ in a continuous
space would be to consider a tree as a function $x : V \to \mathbb{R}^+$ (instead of $\{0,1\}$), where the
value $x(v)$ would represent the length of the edge connecting the node $v$ to her mother.
The value $x(v) = 0$ means that the node $v$ is not present. The metric could be the one
given in (2.4), which coincides with the previous one for trees with unit edge lengths. A
tree condition like "$x(va) > 0$ implies $x(v) > 0$" is also needed, but other conditions could
be proposed. For instance one could collapse the vertices with $x(v) = 0$, but in this case
the trees would not have a bounded number of offspring, nor would the vertex notation
introduced in Section 2 be appropriate.

Let $a, b, c, x$ be arbitrary distinct points in a metric space $T$ such that $x$ belongs to a
geodesic from $a$ to $b$, that is, $d(a,b) = d(a,x) + d(x,b)$. Let $a', b', c', x'$ in $\mathbb{R}^2$ be points
located in such a way that the relative distances are the same, that is, $d(a,b) = d'(a',b')$,
$d(a,c) = d'(a',c')$, etc., where $d'$ is the Euclidean distance in $\mathbb{R}^2$. It is said that $T$ is of
non-positive curvature if $d(x,c) \le d'(x',c')$ for any choice of $a, b, c, x$. These spaces are also
called CAT(0), see Billera, Holmes and Vogtmann (2001), page 750.

We now give an example showing that our space cannot fit the above property. Let
$a = \{111, 12\}$ and $b = \{11, 121\}$ be the trees in Figure 1 and $x = \{11, 12\}$ and $c =
\{111, 112, 121, 122\}$ those of Figure 2. Consider the distance (2.4) with $\phi(v)$ depending only
on the generation of $v$, so that $\phi(111) = \phi(121) = \phi(122) = \phi(112) = \alpha$, for some $\alpha > 0$.
The tree $x$ belongs to a geodesic between $a$ and $b$: $d(a,x) = d(b,x) = \alpha$, $d(a,b) = 2\alpha$.
On the other hand $d(a,c) = d(b,c) = 3\alpha$ and $d(x,c) = 4\alpha$. Consider the corresponding
Euclidean triangle $(a', b', c')$ with the same relative distances. The point equidistant from
$a'$ and $b'$ in the Euclidean geodesic, corresponding to $x$, is $x' = (a' + b')/2$. Since $d'(x',c') =
\sqrt{8}\,\alpha < 4\alpha = d(x,c)$, our tree space cannot be embedded in a CAT(0) space.
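The distances in this counterexample can be verified mechanically. In the sketch below the trees are sets of vertex labels, the weights per generation are arbitrary illustrative values with $\alpha$ the weight of generation 3, and the length of the Euclidean segment from $c'$ to the midpoint of $a'b'$ is obtained from the standard median-length formula for a triangle.

```python
import math

WEIGHT = {1: 4.0, 2: 2.0, 3: 1.0}   # phi by generation (illustrative); alpha = WEIGHT[3]


def tree_from_terminals(terminals):
    """Vertex set of the finite tree with the given terminal nodes."""
    return frozenset(leaf[:k] for leaf in terminals for k in range(1, len(leaf) + 1))


def distance(x, y):
    """Distance (2.4) with a weight depending only on the generation."""
    return sum(WEIGHT[len(v)] for v in x ^ y)


a = tree_from_terminals(["111", "12"])
b = tree_from_terminals(["11", "121"])
x = tree_from_terminals(["11", "12"])
c = tree_from_terminals(["111", "112", "121", "122"])

ab, ac, bc, xc = distance(a, b), distance(a, c), distance(b, c), distance(x, c)
# x lies on a geodesic from a to b
on_geodesic = distance(a, x) + distance(x, b) == ab
# Euclidean distance from c' to the midpoint x' of the segment a'b'
median = math.sqrt((2 * ac ** 2 + 2 * bc ** 2 - ab ** 2) / 4)
print(on_geodesic, ab, ac, bc, xc, median)   # xc exceeds the Euclidean value
```

With $\alpha = 1$ the tree distance $d(x,c)$ is 4 while the Euclidean comparison distance is $\sqrt{8}$, which is the violation used in the text.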



Figure 2: The trees $x = \{11, 12\}$ and $c = \{111, 112, 121, 122\}$.

Notice that the tree $\tilde{x} := \{111, 121\}$ is also in a (different) geodesic between $a$ and $b$.
However $d(\tilde{x}, c) = 2\alpha < \sqrt{8}\,\alpha = d'(\tilde{x}', c')$. In fact the triangle $(a', b', c')$ is the same and
$\tilde{x}' = x'$. Actually, the fact that there are two geodesics going from $a$ to $b$ indicates that
the space cannot be of negative curvature.

The metric we propose for the space of trees is usual in interacting particle systems,
some of which are defined in $\{0,1\}^S$ for countable $S$, for example. The product of the
discrete topologies induces a metric like (2.4). Under this metric, the convergence of a
sequence $x_n$ to $x$ is equivalent to the convergence of $x_n(v)$ to $x(v)$ for every vertex $v$. Valiente
(2001) considers spaces of finite trees with ordered vertices, reviews several distances and
proposes a new metric. An important example is the so-called "edit distance", which
counts the number of operations (eliminate a vertex, add a vertex) that need to be done in
order to transform one tree into another. Critchlow (1980) proposes some metrics in
the set of permutations of a finite sequence that may be adapted to a finite space of trees.
It would be interesting to understand whether our results can be proven in those spaces.

Billera, Holmes and Vogtmann (2001) describe various spaces of "phylogenetic trees"
and construct a (continuous) convex metric space of trees with a fixed number $n$ of final
vertices (i.e., vertices with no daughters). The resulting space $T_n$ is CAT(0). Phylogenetic
trees are constructed from the final vertices to the root by successively grouping subsets of
vertices as in Kingman's coalescent. Each vertex with descendants represents the most
recent common ancestor of the descendants, and the length of the edge $(v, v')$ represents the
time a group of species represented by $v'$ needed to split. In our space the trees can have
a variable number of final vertices; our counterexample does not apply to spaces of trees
with a fixed number of final vertices. Another difference with phylogenetic trees is that in
our space we do not label the (final) vertices.



8 Final remarks 

Our motivation was to produce a statistical tool to study the asymptotic behavior of
sequences of random trees. The law of large numbers is not directly applied to construct
the tests, but is important to guarantee the consistency of the estimators. On the other
hand, the central limit theorem (Theorem 5.3) uses the tree structure and a particular form
of the distance. The shape of the function $\phi$ intervening in the distance was necessary to
show that the majorizing measure condition holds (Lemma 5.2). We believe this can be
extended to other structures contained in a subset of $\{0,1\}^S$ for $S$ countable. Another
possible extension is to eliminate the upper bound $m$ on the number of offspring. If the
mean number of offspring is not finite, then the limits may be stable laws, but this is yet to
be established.

The statistical application we have considered in Section 6 points in the direction of a
Kolmogorov-Smirnov type goodness-of-fit test. We are interested in the decision problem:
given a random sample $T_1, \dots, T_n$, can we decide if their underlying common distribution $P$ is
a given $P_0$? For instance, does the sample follow the Galton-Watson model with parameter
$p_0$? We think that the statistic given in Section 6 is adequate for this problem. The results
in Busch et al (2006), where our test has been applied to several simulated examples and to
a real data example classifying FGF protein families, point in this direction.

The implementation of the tests requires the computation of the statistic (6.1), which
is a supremum over the space of trees $y$ of the deviation $\sqrt{n}\,|g_n(y) - g_0(y)|$. The
computation time of this task may increase fast with the number of nodes. Busch et al
(2006) propose a method to transform this problem into the computation of the minimal cut
of the flux of a related graph. This allows one to see the behavior of the test in some concrete
examples related to Galton-Watson generated random trees.